These instructions are essential, so please read them all carefully.
Submit your homework on your GitHub page as the RMarkdown (.Rmd) and HTML files.
Please answer the question prompt and show your code (inline). That is, all your code should be visible in the knitted chunks.
To complete this homework, you may write in the HW3.Rmd file.
(It is recommended to complete this homework in R Studio, where clicking the
Knit button would knit your homework.)
(Note: This homework is substantially more challenging than Homework 2 since this homework is much more open-ended. If you feel stuck in this homework, feel free to consult ChatGPT. ChatGPT can give you strategies for tackling a problem, code on how to tackle the problem, and an explanation of how each step of the code works. Try using ChatGPT as part of your workflow. This homework will get you to realize that writing “code that gets the job done” (while important) is only a tiny part of what it means to “be a strong coder.”)
Note: Most of your R code for this homework will not be in this homework.
You mainly write .R files inside the R or tests folders.
Therefore, you only need to show a little code inside this R Markdown file.
You only need to write things inside this R Markdown file in the questions that explicitly ask you to do so.
Intent: The intent of this question is: 1) to give you practice putting
R functions into your UWBiost561 package, and 2) to help you create ways to test your function in Question 3 below meaningfully.
Recall the generate_random_graph() function that you used in Homework 2.
source("https://raw.githubusercontent.com/linnykos/561_s2024_public/main/HW2_files/random_graph_functions.R")
generate_random_graph
#> function(n,
#> clique_fraction = 0.2,
#> density_low = 0.1){
#> # Check all the arguments are correct
#> stopifnot(n %% 1 == 0, n >= 0,
#> clique_fraction >= 0, clique_fraction <= 1,
#> density_low >= 0, density_low <= 1)
#>
#> # Generate an unsymmetric matrix
#> adj_mat <- matrix(sample(x = c(0,1),
#> size = n^2,
#> prob = c(1-density_low, density_low),
#> replace = TRUE),
#> nrow = n, ncol = n)
#>
#> # Symmetrize the matrix
#> adj_mat <- adj_mat + t(adj_mat)
#> adj_mat[adj_mat > 0] <- 1
#> diag(adj_mat) <- 1
#>
#> # Form the clique
#> clique_size <- ceiling(n * clique_fraction)
#> adj_mat[1:clique_size, 1:clique_size] <- 1
#>
#> # Randomize the order of the nodes
#> sample_idx <- sample(1:n)
#> adj_mat <- adj_mat[sample_idx, sample_idx]
#>
#> # Compute the appropriate reverse order
#> rev_order <- sapply(1:n, function(i){
#> which(sample_idx == i)
#> })
#> # To see what happens, try: adj_mat[rev_order, rev_order]
#>
#> return(list(adj_mat = adj_mat,
#> rev_order = rev_order))
#> }
Your (deliberately open-ended) goal in this question is to design a function
generate_partial_clique() that generates a random adjacency matrix
with a large partial clique, not necessarily a (fully connected) clique.
Question 1A: Create a function generate_partial_clique()
inside your UWBiost561 package inside a file called generate_partial_clique.R
inside your R folder.
(There is nothing to report for this question. Your code will be in the R folder, not
in this R Markdown file.)
Specifications on your function overall:
generate_partial_clique.generate_partial_clique.R in the R folder within your UWBiost561 package.generate_partial_clique() depends on, all your additional functions must also be in generate_partial_clique.R. That is,
generate_partial_clique.R should be “self-contained.”Specifications of your function inputs:
generate_partial_clique() must be called n, clique_fraction, and clique_edge_density respectively.n argument is a positive integer. This argument represents the number of nodes in the graph (in a literal sense, the number of rows and columns in your outputted adjacency matrix).clique_fraction argument is a single numeric between 0 and 1 (inclusive). This argument represents the fraction of nodes
(of the n nodes) that are part of the partial clique. (That is, at least
round(n*clique_fraction) nodes should be part of a partial clique.)clique_edge_density argument is a single numeric between 0 and 1 (inclusive). This argument represents the edge density among the nodes in the clique. (For example, the generate_random_graph() you used in Homework 2 essentially
had clique_edge_density=1, as the adjacency matrix you worked in Homework 2 was a (fully connected) clique.) Specifically, this means if your partial clique
has m=round(n*clique_fraction) nodes, then a (fully connected) clique would have
m*(m-1)/2 (i.e., m choose 2) edges. A partial clique with edge density of
clique_edge_density would instead have at least round(clique_edge_density*m*(m-1)/2) edges among the m nodes.n, clique_fraction and clique_edge_density must have a default value. To use your function, a user should not need to set any other argument aside from n, clique_fraction and clique_edge_density.Specifications of your function outputs:
You must output a list.
adj_mat. This is the random adjacency matrix that you construct with the partial clique.
Specifically, adj_mat should be a symmetric matrix with only values 0 or 1, 2) has 1’s along its diagonal, and 3) has no row- or column-names. (By construction, it should have a partial clique consisting of at least round(clique_fraction*n) nodes with
edge density clique_edge_density.)Note: I purposely did not specify all the necessary ingredients to
create the random graph. You can design any function as long as it satisfies the
required specifications. However, as you will realize when you start testing your
function in Question 3, some ways to create graphs with partial cliques are “too easy” that they “aren’t useful” when testing your compute_maximal_partial_clique() in
Question 2.
Question 1B: Following the demo showed in Lecture 4,
create a ROxygen skeleton for the generate_partial_clique() function
inside the generate_partial_clique.R file. Write meaningful documentation for the
generate_partial_clique() function.
Importantly, it should be exported via the @export tag.
(There is nothing to report for this question. The ROxygen skeleton is not for this R Markdown file)
Question 1C: Install your R package. You can do this
using R Studio or running devtools::install() in the R console.
(There is nothing to report for this question.
You do not do this in your R Markdown file.)
You might find https://docs.google.com/document/d/103ayPYvyzXHa84YFt-cQm_hMM1Ukqo7hlUygYD6UnC0/edit?usp=sharing useful if you are having difficulty installing your UWBiost561 package.
Question 1D: In your homework’s R Markdown file, show that your generate_partial_clique() function works. Specifically, run
the following lines. (If you are copy-pasting the following R chunk,
remove the eval = FALSE tag.)
library(UWBiost561)
set.seed(0)
simulation <- UWBiost561::generate_partial_clique(
n = 10,
clique_fraction = 0.5,
clique_edge_density = 0.9
)
simulation$adj_mat
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 1 0 1 0
#> [2,] 1 1 0 1 0
#> [3,] 0 0 1 0 0
#> [4,] 1 1 0 1 0
#> [5,] 0 0 0 0 1
#> [6,] 0 0 0 0 0
#> [7,] 1 1 0 1 0
#> [8,] 0 0 0 0 0
#> [9,] 1 1 0 1 0
#> [10,] 0 0 0 0 0
#> [,6] [,7] [,8] [,9]
#> [1,] 0 1 0 1
#> [2,] 0 1 0 1
#> [3,] 0 0 0 0
#> [4,] 0 1 0 1
#> [5,] 0 0 0 0
#> [6,] 1 0 0 0
#> [7,] 0 1 0 1
#> [8,] 0 0 1 0
#> [9,] 0 1 0 1
#> [10,] 0 0 0 0
#> [,10]
#> [1,] 0
#> [2,] 0
#> [3,] 0
#> [4,] 0
#> [5,] 0
#> [6,] 0
#> [7,] 0
#> [8,] 0
#> [9,] 0
#> [10,] 1
You will get full marks if your markdown shows a code chunk that shows both this code
and the outputted matrix res$adj_mat itself. Congratulations!
You’ve just made your function in your R package!
Intent: The intent of this question is: 1) to give you practice developing a function that is more “open-ended”, and 2) to emulate a more realistic coding experience. The code you write here will play a huge role in Homework 4, where you will look at the function fellow students write and vice versa.
Question 2A: Create a function compute_maximal_partial_clique()
inside your UWBiost561 package that computes the largest partial clique
given an adjacency matrix adj_mat and a required edge density alpha.
(There is nothing to report for this question. Your code will be in the R folder, not
in this R Markdown file.)
Figure 1: Illustration of an adjacency matrix (left), where, when you reorder the nodes (right, similar to what you did in HW2), you’ll see that the first 9 nodes from a 95% partial clique.
Note:
adj_mat and alpha. (If adj_mat were the empty graph, i.e., just a diagonal matrix, then the maximal partial clique has size 1: a node is a clique with itself.)Specifications on your function overall:
compute_maximal_partial_clique.compute_maximal_partial_clique.R in your R folder within your UWBiost561 package.compute_maximal_partial_clique() depends on, all your additional functions must also be in compute_maximal_partial_clique.R. That is,
compute_maximal_partial_clique.R should be “self-contained.”adj_mat that is at most 30 by 30 (i.e., there are at most 30 nodes), your method should take 30 seconds to complete at most.Specifications of your function inputs:
compute_maximal_partial_clique() must be called adj_mat and alpha respectively.adj_mat argument is: 1) a symmetric matrix with only values 0 or 1, 2) has 1’s along its diagonal, 3) has no row- or column-names, and 4) will have between 5 to 50 rows/columns (inclusive).alpha argument is: 1) a single numeric (i.e., a length of 1), and 2) has a value between 0.5 and 1 (inclusive).adj_mat and alpha must have a default value. To use your function, a user should not need to set any other argument aside from adj_mat and alpha.Specifications of your function outputs:
You must output a list. The list must contain at least these two named elements, specifically the first and second elements:
clique_idx. This would be a numeric vector of index numbers corresponding to the nodes (i.e., values between 1 and nrow(adj_mat)) that your function deems to be in the maximum partial clique. This vector cannot have duplicate elements, must be positive integers, and the largest value cannot exceed nrow(adj_mat). Specifically, this means your function will elect adj_mat[clique_idx,clique_idx] to be the maximal partial clique, meaning: 1) if m=length(clique_idx), then (sum(adj_mat[clique_idx,clique_idx])-m)/2 >= alpha*m*(m-1)/2, which ensures that the number of edges among the nodes in clique_idx is more than (100*alpha)% of a full clique of size m, and 2) the size of clique_idx cannot (reasonably) be increased without making it not a valid partial clique at density alpha. (Remember: It is easy for you to ensure condition #1, but it is very hard for you to guarantee condition #2.)edge_density. This is the percentage of edges in adj_mat among the nodes in clique_idx. (This calculation is given in the demo code below.) Specifically, this is number between 0 and 1 (inclusive), and (if computed correctly and the length of clique_idx is larger than one) should always be greater than or equal to alpha by construction.Rules/Guidelines about using external packages:
Here is a rough skeleton of what your compute_maximal_partial_clique() might look like:
# This is NOT a good implementation.
# This is solely for demonstration.
# This lousy function simply picks a random set of nodes.
# It doesn't actually compute the valid partial clique!
compute_maximal_partial_clique <- function(adj_mat, alpha){
n <- nrow(adj_mat)
clique_idx <- sample(1:n, size = ceiling(n/2))
m <- length(clique_idx)
edge_density_numerator <- (sum(adj_mat[idx,idx]) - m)/2
edge_density_denominator <- m*(m-1)/2
edge_density <- edge_density_numerator/edge_density_denominator
return(list(clique_idx = clique_idx,
edge_density = edge_density))
}
Note: You are not expected to know many “algorithms” for this course. Hence, if you 1) understand the task and 2) are still trying to figure out how to approach this question, please consult ChatGPT. Lectures 5 and 6 will give you live demonstrations on how to “code with ChatGPT.”
If you feel overwhelmed by this question, you can use ChatGPT to help you. (This is okay as long as your function works and follows the specifications for this question – you might need to modify ChatGPT’s code.) However, I can guarantee you that ChatGPT will not give you anywhere close to an “optimal” solution to this task (in many senses of the word). This is because this problem provably does “not have a correct solution that can feasibly be used.” Hence, if you feel courageous and know more about algorithms, you can go down the rabbit hole to see how to design a better algorithm.
I will be assessing your implementation based on you demonstrating a “good faith effort,” not based whether your function outputs the “correct” maximal partial clique. You will receive full credit if your function works, provides reasonable outputs, and passes your own unit tests (in Question 3).
Question 2B: Similar to Question 1B and 1C,
create a ROxygen skeleton for the compute_maximal_partial_clique() function
inside the compute_maximal_partial_clique.R file.
Write meaningful documentation for the
compute_maximal_partial_clique() function.
Importantly, it should be exported via the @export tag. Then, install your
UWBiost561 package again.
(There is nothing to report for this question. The ROxygen skeleton is not for this R Markdown file)
Your ROxygen skeleton should include two to five sentences about how your method works.
Question 2C: Demonstrate (in this R Markdown file)
that your compute_maximal_partial_clique() works on the random
adjacency matrix outputted from your function generate_partial_clique().
Specifically, run
the following lines. (If you are copy-pasting the following R chunk,
remove the eval = FALSE tag.)
library(UWBiost561)
set.seed(0)
simulation <- UWBiost561::generate_partial_clique(
n = 10,
clique_fraction = 0.5,
clique_edge_density = 0.9
)
adj_mat <- simulation$adj_mat
res <- UWBiost561::compute_maximal_partial_clique(
adj_mat = adj_mat,
alpha = 0.9
)
res
#> $clique_idx
#> [1] 1 2 4 7 9
#>
#> $edge_density
#> [1] 1
This question is purposefully open-ended. Your goal is to simply
show that the output of your compute_maximal_partial_clique() function
based on the adjacency matrix from generate_partial_clique() is “reasonable.”
Intent: The intent of this question is to give you experience
writing unit tests inside your UWBiost561 package.
Question 3A: Following the overview in
Lecture 5, as well as additional walkthrough in https://docs.google.com/document/d/103ayPYvyzXHa84YFt-cQm_hMM1Ukqo7hlUygYD6UnC0/edit?usp=sharing, create a tests folder in your UWBiost561 package
that contains a testthat.R file (shown below) as well as testthat folder.
(There is nothing to report for this question.)
# This is what your testthat.R file should contain
library(testthat)
library(UWBiost561)
testthat::test_check("UWBiost561")
Question 3B: Write at least two unit tests for your
generate_partial_clique() function inside a file called test_generate_partial_clique.R
within the testthat folder.
(There is nothing to report for this question. Your code will be in the tests/testthat folder, not
in this R Markdown file.)
Question 3C: Writing at least five unit tests for your
compute_maximal_partial_clique() function inside a file called test_compute_maximal_partial_clique.R
within the testthat folder. Based on the “List of types of unit-tests” page in Lecture 5,
ensure each of the five unit tests is in a “different category.”
(There is nothing to report for this question. Your code will be in the tests/testthat folder, not
in this R Markdown file.)
Some pointers:
generate_partial_clique() within some of your tests for compute_maximal_partial_clique().
Since you are constructing the random adjacency matrix with a partial clique
with at least edge density of clique_edge_density of size round(n*clique_fraction),
then you should expect that for “simple” random adjacency matrices, your compute_maximal_partial_clique() can recover the partial clique that you constructed.Question 3D: Run devtools::test() for your R package UWBiost561
when you’re in your R project.
(There is nothing to report for this question. Your code will be in the R console, not
in this R Markdown file.)
Intent: The intent of this question is to experience the basics of making an R package.
Question 4A: Fix the DESCRIPTION file as needed. Importantly, this involves adding any R package dependencies your generate_partial_clique() and compute_maximal_partial_clique() depend on. (If none, you don’t need to worry about this.) You should also replace the default author, package title, and description with something more meaningful. (There is nothing to report for this question.)
Question 4B: In your R project for the UWBiost561 package, run the command usethis::use_mit_license() in the R console. (There is nothing to report for this question.)
At this point, your R package should have the following.
R folder:
generate_partial_clique.Rcompute_maximal_partial_clique.Rtests folder:
testthat.Rtestthat folder:
test_generate_partial_clique.Rtest_compute_maximal_partial_clique.Rvignettes folder:
HW1.Rmd, HW1.html, HW2.Rmd, HW2.html,
HW3.Rmd, HW3.html (or analogously named)DESCRIPTIONLICENSE.RbuildignoreQuestion 4B: Run devtools::check(). Ensure there are no errors, but it’s okay if you don’t want to fix the warnings. (devtools::check() will also automatically generate a man folder with your documentation converted from your ROxygen code and a NAMESPACE file.) It might take a while to fix all the errors
that devtools::check() complains about – I would advise using Google to determine what exactly it is unhappy about, or you can ask on the Canvas Discussion board.
(There is nothing to report for this question. Your code will be in the R console, not
in this R Markdown file.)
Question 4C: Include a few screenshots of your devtools::check()
result in this R Markdown file. (As a guideline, one screenshot should show the first 20-or-so lines of your
devtools::check() results, and a second screenshot should show the last
20-or-so lines of your results.) You can use the knitr::include_graphics() function to include
figures inside this R Markdown file.
The intent of this question is to provide “evidence” that your devtools::check()
went smoothly. You do not need to worry about what your screenshots show specifically.
Figure 2 and 3 shows the screenshots you are trying to reproduce.
Figure 2: The first screenshot you are trying to reproduce (without the Sample watermark) in Question 4C.
Figure 3: The second screenshot you are trying to reproduce (without the Sample watermark) in Question 4C.
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot1.png")
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot2.png")
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot3.png")
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot4.png")
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot5.png")
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot6.png")
knitr::include_graphics("/Users/tatithegreat/Documents/UW/BIOST561/UWBiost561/Images/Screenshot7.png")
Question 4D: Please include the following code chunk’s result in your Markdown file. (If you are copy-pasting the following R chunk,
remove the
eval = FALSE tag.) This will be useful for me (Kevin) to assess which packages and R version you are using.
devtools::session_info()
Intent: The intent of this question is to prepare you for HW4 and the final project.
Please read: What to expect for HW4:
For HW4, I will be curating every student’s implementation of compute_maximal_partial_clique() and provide one folder that contains every student’s
(anonymized) implementation.
Then, you will be:
For this reason, I strongly encourage you to take diligence testing your code since every other student will soon be trying out your implementation on their simulated adjacency matrices.
Question 5A: Please make sure you can access this website: https://opensesame.biostat.washington.edu/. You will need to use this website to access the department servers. (More on this in Lectures 7 and 8.)
If you can, please write, “I can access OpenSesame.” If not, please write, “I cannot access OpenSesame,” and please email Kevin immediately (who will help get you access to OpenSesame).
I can access OpenSesame.
Question 5B: In a few sentences, please describe what you think
you’d like to make do for your final project. It is 100% okay for you to “reuse” R code for another course (either past or current, even from your previous institution).
Ideally, it is R code that you care about. What you write here is not committal – you can change your idea anytime. The intent of this question is to get you to start thinking about this.
For my final project I would like to work on my current research project in single cell RNA sequencing.
The full project specifications will be released in a few weeks. To give you a sense of what to expect, you’ll be making a PkgDown website that looks like https://linnykos.github.io/561_s2024_example/. Briefly, this will involve:
README page that explains the purpose of the R package.If you do not have any R project in mind, the full project specifications will
give you more specific directions on using your UWBiost561 package to satisfy
these requirements. However, this experience will feel more meaningful if you
use something you value more as the foundation for the final project.
(This project is not intended to take you a long time. Ideally, if you choose a coding project you care about as the foundation for this final project, you already have all the R code, and all that remains is “everything else.” I do not care about how many or complex your functions are. Realistically, you can use just a meaningful subset of functions in your existing code base for this project.)
Question 5B: Push all your code onto GitHub. This includes all the contents inside your R and tests folder and your DESCRIPTION, NAMESPACE, and LICENSE files. Please double-check that you can see all the necessary files online on the GitHub website (i.e., https://github.com/).
(There is nothing to report for this question.)
This “question” is an additional way for students to communicate with instructors. You could include positive feedback about topics you enjoyed learning in this module, critiques about the course difficulty/pacing, or some questions/confusions you had about course material. Your feedback can help shape the course for the rest of this quarter and future years. Please be mindful and polite when providing feedback. You may leave this question blank. -NA